Genetic structure and origin of emu populations in Japanese farms inferred from large-scale SNP genotyping based on double-digest RAD-seq

The emu is a novel poultry species in Japan. However, Japanese farmed emu populations have reduced genetic diversity owing to inbreeding. We have previously suggested that there are genetic resources in the Tohoku Safari Park (TSP) and Fuji/Kakegawa Kachoen Garden Park (FGP/KGP) to extend the genetic diversity of commercial emu farms based on microsatellite (SSR) and mitochondrial DNA. However, those markers provide relatively poor information. Thus, we investigated the genetic structure of farmed Japanese populations based on a large-scale genotyping system using RAD-seq and verified the usefulness of TSP and FGP/KGP as genetic resources for expanding genetic diversity. Admixture, phylogenetic, and principal component analyses based on 28,676 SNPs showed that TSP individuals were ancestors in the Okhotsk Emu Farm (OEF). FGP/KGP individuals showed a unique genetic component that differed from that of the others. We have previously reported that the mitochondrial haplotypes of FGP/KGP were shared with an isolated wild population in eastern Australia. These results suggest that FGP/KGP individuals originated from an eastern Australia isolated population different from other populations including ancestral of OEF/TSP. Our results would provide information for the development of Japanese emu farms and industry and for the conservation of genetic resources in the Australian wild emu.


Sample
In a previous study, we suggested that TSP and FGP/KGP are genetically different from OEF based on analyses using SSR markers and mtDNA haplotypes 5 .Therefore, we used samples derived from the same populations as in our previous studies and included a large number of individuals with OEF as controls.
Liver tissues were collected from 95 and 100 birds slaughtered in 2016 (OEF2016) and 2017 (OEF2017) at OEF, respectively.Feather pulp samples (n = 58) were randomly collected from 19, 25, and 14 individuals in the TSP, FGP, and KGP groups, respectively.Genomic DNA (n = 253) was isolated from liver tissues and feather pulp using ISOGENOME (Nippon Gene, Tokyo, Japan) according to the manufacturer's instructions.
All procedures involving animals met the guidelines described in "The Proper Conduct of Animal Experiments, " proposed by the Science Council of Japan and approved by the Ethical Care and Use of Animals Committee at the Tokyo University of Agriculture (approval number: 270049, 280,002, 290,096, 300,126, 2,019,109, 2,022,132, and 2,023,156).The study was carried out in compliance with the ARRIVE guidelines.

SNP detection and evaluation of genetic diversity
SNPs were detected using the ref-map.plscript in Stacks ver.2.61 24 with − r 0.5 and − p 5 options from each mapping data set.The number of polymorphic loci, number of private alleles, expected heterozygosity (H E ), observed heterozygosity (H O ), and F IS values were calculated using populations software in the Stacks script.Parentage analysis for the detection of sibships and the inbreeding effect was visualized using Colony2 software 25 (https:// www.zsl.org/ about-zsl/ resou rces/ softw are/ colony).

Construction of phylogenetic tree
The genetic distance (F ST ) among farms was calculated using Populations 1.2.32 software (https:// bioin forma tics.org/ ~tryph on/ popul ations/) with a genepop-format file outputted by the populations software in Stacks.The genetic distances were computed among all individuals and phylogenetic trees were constructed using BEAST2 ver.

Admixture analysis
The most likely number of ancestors were predicted based on the PLINK format file using ADMIXTURE ver.1.3.0 26 (https:// dalex ander.github.io/ admix ture/ index.html) under default conditions.The bar plot was exported using the Pophelper and ggplot2 packages in the R software.

Principal coordinate analysis (PCoA) with tSNE plot
The genotype data formatted by the VCF file were manipulated using the vcfR package in R software.After PCoA was performed for up to 50 dimensions based on the loaded SNP information, dimensionality reduction was performed to calculate the t-distributed stochastic neighbor embedding (tSNE).The tSNE plot was drawn using the ggplot2 package in R software.

SNP detection and genetic diversity
In five populations, 3,166,876 loci were commonly mapped to the reference sequence, with 28,676 SNPs detected at a rate of over 50% of samples in each population.Among these, 5198, 22,271, 25,484, 13,469, and 6825 polymorphic sites were detected in TSP, OEF2016, OEF2017, FGP, and KGP, respectively (Table 1).The polymorphic rates of the 3,166,876 mapped reads were 0.16%, 0.70%, 0.80%, 0.43%, and 0.22% for TSP, OEF2016, OEF2017, FGP, and KGP, respectively.The number of private alleles were 69, 1260, 3259, 636, and 154 in TSP, OEF2016, OEF2017, FGP, and KGP, respectively.Therefore, OEF possessed higher genetic variation than the other populations, in contrast to the lower values in the TSP and KGP populations.The expected and observed heterozygosities (H E /H O ) of TSP, OEF2016, OEF2017, FGP, and KGP showed low values, which were 0.025/0.018,0.035/0.028,0.040/0.034,0.039/0.029,and 0.033/0.024,respectively.The inbreeding coefficients (F IS ) of the TSP, OEF2016, OEF2017, FGP, and KGP were 0.320, 0.198, 0.175, 0.237, and 0.305, respectively.These results suggest that all tested Japanese farmed populations tend to inbreed and that TSP is in the most profound condition, consistent with our previous reports 5,27 .Pedigree analysis using Colony2 software showed that there were three full-sib pairs and 69 half-sibs in the analyzed 253 individuals (Fig. S2).Among them, 3 full-sibs and 53 half-sibs were found in the TSP, indicating that the inbreeding of this population progressed.

Phylogenetic trees
Genetic distance (F ST ) among the populations ranged from 0.004 to 0.037 (Table S2).The lowest and highest F ST values were observed between OEF2016 and OEF2017 (0.004) and between KGP and TSP (0.037), respectively.The phylogenetic tree based on F ST showed that FGP and KGP formed a clade with a relatively high genetic distance between them, which was distinct from the clade composed of OEF2016, OEF2017, and TSP (Fig. 1A).
The phylogenetic tree, based on the genetic distance among 253 individuals derived from all tested populations, showed four clusters constructed by TSP, FGP/KGP, and separated OEFs (Fig. 1B).In the cladogram shown in Fig. 1B, the lines were more clearly drawn for reliable divergences than for uncertain divergences.FGP and KGP were predicted to be separate from the common ancestors of all the tested Japanese emu farm populations in the early stages.A few individuals derived from the OEF clustered into a clade different from many other members and had a closed relationship with the TSP population.These results suggest that the origin of FGP/KGP may be from a different Australian population than the others in Japan, and OEF is composed of two divergent groups, one of which shares common ancestors with a few individuals in the TSP.

Admixture analysis
To estimate genetic variability within and divergence among populations, structural analysis was performed using ADMIXTURE software.At K = 2, the genetic structure showed that TSP and FGP/KGP had similar components, which were distinct from those of OEF (Fig. 2A, upper).In K = 3, although TSP and FGP/KGP were predominantly represented in monochrome, respectively, as they were K = 2, they exhibited distinct genetic components between them (Fig. 2A, middle).OEF has a complex genetic structure, including the TSP and a few FGP/KGP components.At K = 4, TSP was completely represented by a single color (Adm1), and OEF had all the genetic components derived from TSP and FGP/KGP (Adm2), in addition to their own unique components (Fig. 2A,

PCoA with tSNE plot
The t-distributed stochastic neighbor embedding (tSNE) plot based on SNP genotypes grouped the OEF, TSP, and FGP/KGP populations, except for a few individuals (Fig. 2B).Similar to the results shown in Fig. 1, OEF was widely distributed on these reduced dimensions, indicating that it had various genetic variations compared with the other populations.Almost all the KGP individuals overlapped with the FGP group.The distribution of tSNE in the OEF and FGP groups was completely separated, and they were widely distant.In contrast, the distribution of a few individuals from the TSP and KGP groups overlapped with that of the individuals from the OEF group.A large proportion of TSP individuals were collectively distributed adjacent to the FGP/KGP group.
Next, we integrated the results from the admixture analysis and the tSNE plot, as shown in Fig. 2C.The pie charts of each individual assigned by the admixture analysis were mapped onto the tSNE plot to visualize the genetic structure and genetic distance among all samples (Fig. 2C).Consistent with the above results, the FGP (ice blue colored frame) and KGP (blue colored frame) groups formed distinct clusters from the OEF group (gray-and yellow-colored frames), and their genetic components were composed of only Adm2 (green colored pie chart), which was not observed in the other populations.The OEFs were distributed into three clusters (Adm1, Adm3, and Adm4) when the putative number of ancestors (K) was four in the admixture analysis (Fig. 2A).The scatter tSNE plot-integrated clustering information obtained from the admixture analysis showed that OEF was widely spread from coordinates approximately − 25 in tSNE-1 to 25 in tSNE-2, with multiple clusters for each genetic component.The OEF population comprises three subgroups: Adm1 (represented by an orange pie chart), Adm3 (depicted as an ice blue pie chart), and Adm4 (represented by a purple pie chart).These subgroups were derived from various ancestors and exhibited a significant spread on the tSNE plot, suggesting a greater degree of genetic diversity than other populations.
The Adm1, which was occupied by individuals derived from TSP when K = 4, displayed distinct distributions near the KGP/FGP clusters and within the OEF cluster on coordinates (− 20.0, 21.4) and (− 3.7, − 1.7) in the tSNE-1 and tSNE-2 dimension, respectively.Some individuals from the TSP exhibited overlapping plots within the OEF population.Therefore, it can be inferred that some TSP individuals share common ancestral origins with the OEF population.

Genetic diversity of Japanese farmed populations
In our previous studies, Japanese farmed emu populations were clustered into genetically distinct groups based on SSR marker and mtDNA analyses 5,12,13 .The TSP and FGP/KGP populations exhibited genetic components distinct from the OEF population.Therefore, we previously suggested that TSP and FGP/KGP should be constructed from specific ancestral populations.To verify our previous hypothesis, in the present study, we investigated the genetic structure among those populations with OEF based on a large-scale SNP genotyping by RAD-seq analysis, which has a higher resolution compared to that of SSRs and mtDNA.
The number of polymorphic loci that could be mapped to the reference genome was lower in the TSP and KGP populations than that in the other populations (Table 1).In addition, a smaller number of private alleles was observed in TSP/KGP.It is possible that genetic diversity in these populations is low.Because TSP and KGP were composed of a small number of individuals, these results might be derived from a limited number of founders or caused by the reduction of minor alleles due to inbreeding and/or genetic drift, consistent with our previous studies using SSR markers 5,12 .Alternatively, the small number of private alleles detected in TSP/ KGP could be explained by their small population sizes.Therefore, we cannot conclude that genetic diversity in lower in TSP/KGP than in OEF.

Genetic relationships among Japanese farmed populations
In the phylogenetic tree based on the genetic distance among the four populations, FGP and KGP independently diverged from TSP/OEF (Fig. 1A), which is consistent with previous studies 5,12,13 .Unlike previous studies, the TSP population showed a close relationship with the OEF population 5 .The cladogram constructed using the genotypes from 253 individuals also showed that the clade of the FGP/KGP group was clearly separated from that of the other populations (Fig. 1B).Interestingly, FGP and KGP were separated from the other populations before the divergence between TSP and OEF.This result indicates that some TSP individuals were derived from common ancestors to those of OEF, unlike FGP and KGP.Hence, we propose that the sources of the FGP and KGP populations were different from those of the Australian population from the OEF/TSP.Therefore, genetic differentiation observed in these populations may not be attributable to genetic drift or inbreeding.The Japanese farmed populations were largely divided into three clusters, represented by the OEF, TSP, and FGP/KGP groups, with K = 3 and 4 as the putative number of ancestors (Fig. 2A).In K = 4, although TSP and FGP/KGP showed a single cluster, OEF was composed of multiple genetic components.These results suggest that OEF is an admixed population with multiple ancestors and thus maintains a higher genetic diversity than the other three populations composed of a single cluster.Meanwhile, microsatellite analysis in our previous study showed that the OEF population had a "moderate number of alleles, " which was higher than that of TSP, FGP, and KGP, supporting the findings of the present study.
Recently, genome-wide genotyping systems, including RAD-seq, have revealed the genetic relationships among populations and hidden genetic structures in many wild organisms and livestock [29][30][31] .Using RAD-seq, we successfully identified genetic divergence among populations and discovered rare genetic components in the Japanese emu.Thus, this study further supports the utility of RAD-seq-based for population genetics analyses of farmed animals.

Putative origins of Japanese farmed populations
A large proportion of individuals derived from FGP and KGP showed overlapping plots in the tSNE analysis (Fig. 2B).The tSNE scatter plot and admixture analysis also indicated that FGP/KGP have common ancestors, which is consistent with comments from breeders in these populations.In the tSNE plot, the KGP group appeared to be more tightly clustered than the FGP group, reflecting a lower rate of private alleles and pi in KGP than those in FGP (Table 1).Conversely, FGP was predicted to maintain a higher genetic diversity than KGP, even though they exhibited similar population sizes, reflecting widespread plotting on the tSNE in FGP individuals.However, it is unknown why FGP maintain higher genetic diversity than KGP despite similar population sizes and identical sources.
Davis et al. 28 revealed that a wild emu population on the NSW North Coast (NNC) was in reproductive isolation from other populations and suggested that this was caused by geographic isolation due to the presence of the Great Dividing Range.The NNC population showed clear differentiation from the others in the STRU CTU RE analysis in their study, and the predominant mtDNA haplotype was H8, which has rarely been observed in other populations 28 .To predict the origin of farmed Japanese populations, our previous results were compared with those reported by Davis et al. (2022).The Hap-d corresponded to H8 28 and is the most frequently detected haplotype in FGP/KGP 5 and is clearly differentiated from other Japanese populations (Figs. 1 and 2).These facts imply that the primary source of FGP/KGP was the NNC, a wild population in eastern Australia.However, because mtDNA is transmitted by maternal inheritance, it is necessary to compare SNP profiles between FGP/ KGP and NNC to verify our hypothesis.
Most TSP individuals were aggregately plotted at an intermediate position between the OEF and FGP groups (Fig. 2B,C).To understand the detailed genetic relationship between TSP and other populations, we generated a 3D scatter plot depicting the tSNE1-3 components, which indicated that TSP was grouped independently from other populations (Fig. S3).Notably, Adm1, represented by all individuals from the TSP, was distributed near the KGP/FGP and within the OEF clusters (Fig. 2C).These results suggest that part of the TSP was constructed using genetic components derived from OEF and unknown relative populations.In addition, parentage analysis based on SNP genotypes indicated that almost all TSP individuals were in full-sib or half-sib pairs (Fig. S2).This indicates that the TSP population is advancing to a higher level of inbreeding within close relatives.We have previously suggested that the TSP was constructed using a unique genetic component different from OEF based on mtDNA and SSR analyses.Notably, a large proportion of TSP individuals harbored Hap-c, a very rare mtDNA haplotype that has not been found in previous studies 5,12,13 .However, large-scale SNP genotyping based on RAD-seq revealed that TSP harbored a lower number of private alleles than the others, and some of these individuals were included in a cluster composed of OEF members (Table 1, Fig. 2A).The tSNE plot and parentage analysis showed that individuals with TSP shared the same genotypes at many loci within the population, indicating that they may have experienced a bottleneck in the past (Fig. 2B,C).Therefore, we predicted that the ancestors of the TSP population partially originated from individuals common to OEF.
Taking these results into consideration, we propose two hypotheses regarding the origins of TSP as follows: i) TSP lost minor alleles through the bottleneck effect and inbreeding, resulting in the deflection of allele frequencies, and then Hap-c, a minor haplotype in the Australian population, was extended into TSP by founder effects.
ii) The Admixture and PCoA analyses demonstrated that TSP was an independent population from the others, despite sharing some OEF ancestors.Accordingly, the origin of the TSP provisionally designated as ancTSP might be an isolated population in the Australian wild emus, as well as in the NNC.This hypothesis is supported by the presence of rare genetic components and an mtDNA haplotype, which have rarely been detected in other populations, including Australia 5,12,13 .Thus, it is possible that TSP partially originated from a rare Australian mainland population after the isolation of NNC, a putative origin of FGP/KGP, from the major populations (Fig. 1B).If this hypothesis is correct, it would explain the independence of TSP, despite harboring a few genetic components derived from OEF.The survival of ancTSP is highly desirable for the conservation of genetic resources as a cryptic population that has never been discovered in the emu.In conclusion, further analyses based on large-scale SNP genotyping of Australian wild and farmed populations, including extinct species, are needed.

Conclusions
In conclusion, the present study revealed the genetic structure within and the relationships among Japanese emu populations based on genome-wide SNP genotyping.The OEF is the largest population in Japan and has the highest genetic diversity among all the tested populations.FGP and KGP were clearly differentiated from OEF, suggesting that they originated from different populations in Australia.TSP may be composed of a common ancestor of OEF and its unique genetic component is caused by the loss of genetic diversity due to inbreeding.Alternatively, the TSP may be independent of the Japanese farmed population derived from rare Australian individuals.To the best of our knowledge, our study is the first to predict the genetic relationships among populations of Japanese farmed emus and their origins.These findings would provide valuable insights not only for effective emu breeding to extend genetic diversity in Japan but also for the conservation of genetic resources in wild and farmed Australian emus.

Figure 1 .
Figure 1.Phylogenetic trees of Japanese farmed emu populations based on SNP genotypes.(A) Phylogenetic tree among four populations based on F ST calculated using SNP genotypes.F ST values were indicated in each branch.(B) Phylogenetic trees of 253 emu individuals based on the Beast software calculated using SNP genotypes.The yellow, blue, and orange node (A, B) and blanch (B) colors indicated OEF, KGP/FGP, and TSP, respectively.OEF, Okhotsk Emu Farm; TSP, Tohoku Safari Park; FGP, Fuji Kachoen Garden Park; KGP, Kakegawa Kachoen Garden Park.

Figure 2 .
Figure 2. Genetic structure and relationships among Japanese emu farmed populations.(A) ADMIXTURE analysis of four emu populations based on SNP genotype data.Each genotyped emu is indicated by a single vertical line divided into K colors (red, green, ice blue, and purple), where K is the putative number of ancestors.(B) tSNE plot of 253 emu individuals based on SNP data.The five groups were bundled by different colored polygons.Gray (triangles), yellow (cross marks), ice blue (snowflakes), blue (circles), and orange (squares) represent OEF2016, OEF2017, FGP, KGP, and TSP, respectively.(C) tSNE pie plot of 253 emu individuals based on SNP genotype data.Each individual points showed pie chart of divided genetic cluster via Admixture analysis (A).Colored outer boundary of pie charts by gray, yellow, ice blue, blue, and orange indicated individuals derived from OEF2016, OEF2017, FGP, KGP, and TSP, respectively.OEF, Okhotsk Emu Farm; TSP, Tohoku Safari Park; FGP, Fuji Kachoen Garden Park; KGP, Kakegawa Kachoen Garden Park.